Introduction

This report investigates the spatio-temporal patterns of chickenpox cases in Hungary over a period of roughly 10 years. The analysis is based on weekly case counts (522 weekly observations beginning on 3 January 2005) for each of Hungary’s 20 county-level units (19 counties and the capital, Budapest), covering the period from 2005 to 2015. Chickenpox, a contagious viral infection, can lead to outbreaks in both urban and rural regions. Understanding its epidemiology through temporal and spatial analyses is vital for developing effective public health strategies, including vaccination campaigns.

Objectives:

  1. Explore long-term trends, both nationally and by county.
  2. Examine intra-annual seasonality (how chickenpox cases fluctuate throughout the year).
  3. Compare trends across counties.
  4. Investigate temporal and spatial autocorrelation in chickenpox cases.

The findings of this study can help inform targeted interventions to mitigate chickenpox outbreaks in Hungary.

Data Preparation

Before we begin the analysis, we need to load and inspect the datasets. The chickenpox case data is collected weekly for each of Hungary’s 20 counties. The county edge list records which counties are adjacent; it is later converted into an adjacency matrix, which is essential for the spatial autocorrelation analysis.

We will load the chickenpox data and the county adjacency matrix, inspect the structure of both datasets, and perform any necessary data cleaning or transformation.

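# Load the packages used throughout this report
library(dplyr)         # %>%, select, group_by, summarise, mutate
library(tidyr)         # pivot_longer
library(ggplot2)       # plotting
library(lubridate)     # month, week, year
library(plotly)        # ggplotly, layout
library(forecast)      # ggAcf
library(spdep)         # mat2listw, moran.test
library(RColorBrewer)  # brewer.pal

# Load datasets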
chickenpox_data <- read.csv("hungary_chickenpox.csv")
county_edges <- read.csv("hungary_county_edges.csv")

# Inspect datasets
head(chickenpox_data)
##         Date BUDAPEST BARANYA BACS BEKES BORSOD CSONGRAD FEJER GYOR HAJDU HEVES
## 1 03/01/2005      168      79   30   173    169       42   136  120   162    36
## 2 10/01/2005      157      60   30    92    200       53    51   70    84    28
## 3 17/01/2005       96      44   31    86     93       30    93   84   191    51
## 4 24/01/2005      163      49   43   126     46       39    52  114   107    42
## 5 31/01/2005      122      78   53    87    103       34    95  131   172    40
## 6 07/02/2005      174      76   77   152    189       26    74  181   157    44
##   JASZ KOMAROM NOGRAD PEST SOMOGY SZABOLCS TOLNA VAS VESZPREM ZALA
## 1  130      57      2  178     66       64    11  29       87   68
## 2   80      50     29  141     48       29    58  53       68   26
## 3   64      46      4  157     33       33    24  18       62   44
## 4   63      54     14  107     66       50    25  21       43   31
## 5   61      49     11  124     63       56     7  47       85   60
## 6   95      97     26  146     59       54    27  54       48   60
head(county_edges)
##   name_1   name_2 id_1 id_2
## 1   BACS     JASZ    0   10
## 2   BACS     BACS    0    0
## 3   BACS  BARANYA    0    1
## 4   BACS CSONGRAD    0    5
## 5   BACS     PEST    0   13
## 6   BACS    FEJER    0    6
# Check structure of the datasets
str(chickenpox_data)
## 'data.frame':    522 obs. of  21 variables:
##  $ Date    : chr  "03/01/2005" "10/01/2005" "17/01/2005" "24/01/2005" ...
##  $ BUDAPEST: int  168 157 96 163 122 174 153 115 119 114 ...
##  $ BARANYA : int  79 60 44 49 78 76 103 74 86 81 ...
##  $ BACS    : int  30 30 31 43 53 77 54 64 57 129 ...
##  $ BEKES   : int  173 92 86 126 87 152 192 174 171 217 ...
##  $ BORSOD  : int  169 200 93 46 103 189 148 140 90 167 ...
##  $ CSONGRAD: int  42 53 30 39 34 26 65 56 65 64 ...
##  $ FEJER   : int  136 51 93 52 95 74 100 111 118 93 ...
##  $ GYOR    : int  120 70 84 114 131 181 118 175 105 154 ...
##  $ HAJDU   : int  162 84 191 107 172 157 129 138 194 119 ...
##  $ HEVES   : int  36 28 51 42 40 44 40 60 60 34 ...
##  $ JASZ    : int  130 80 64 63 61 95 88 112 67 118 ...
##  $ KOMAROM : int  57 50 46 54 49 97 56 70 46 73 ...
##  $ NOGRAD  : int  2 29 4 14 11 26 10 21 12 6 ...
##  $ PEST    : int  178 141 157 107 124 146 119 178 112 130 ...
##  $ SOMOGY  : int  66 48 33 66 63 59 104 70 116 68 ...
##  $ SZABOLCS: int  64 29 33 50 56 54 85 75 76 59 ...
##  $ TOLNA   : int  11 58 24 25 7 27 20 5 22 31 ...
##  $ VAS     : int  29 53 18 21 47 54 32 66 45 85 ...
##  $ VESZPREM: int  87 68 62 43 85 48 153 149 102 96 ...
##  $ ZALA    : int  68 26 44 31 60 60 70 54 42 54 ...
str(county_edges)
## 'data.frame':    102 obs. of  4 variables:
##  $ name_1: chr  "BACS" "BACS" "BACS" "BACS" ...
##  $ name_2: chr  "JASZ" "BACS" "BARANYA" "CSONGRAD" ...
##  $ id_1  : int  0 0 0 0 0 0 0 1 1 1 ...
##  $ id_2  : int  10 0 1 5 13 6 16 1 16 14 ...
summary(chickenpox_data)
##      Date              BUDAPEST         BARANYA           BACS       
##  Length:522         Min.   :  0.00   Min.   :  0.0   Min.   :  0.00  
##  Class :character   1st Qu.: 34.25   1st Qu.:  8.0   1st Qu.:  8.00  
##  Mode  :character   Median : 93.00   Median : 25.0   Median : 29.50  
##                     Mean   :101.25   Mean   : 34.2   Mean   : 37.17  
##                     3rd Qu.:149.00   3rd Qu.: 51.0   3rd Qu.: 53.00  
##                     Max.   :479.00   Max.   :194.0   Max.   :274.00  
##      BEKES            BORSOD          CSONGRAD          FEJER       
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  4.00   1st Qu.: 14.25   1st Qu.:  6.00   1st Qu.:  7.00  
##  Median : 14.00   Median : 46.50   Median : 20.50   Median : 24.00  
##  Mean   : 28.91   Mean   : 57.08   Mean   : 31.49   Mean   : 33.27  
##  3rd Qu.: 38.75   3rd Qu.: 83.75   3rd Qu.: 47.00   3rd Qu.: 51.75  
##  Max.   :271.00   Max.   :355.00   Max.   :199.00   Max.   :164.00  
##       GYOR            HAJDU           HEVES             JASZ       
##  Min.   :  0.00   Min.   :  0.0   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  9.00   1st Qu.: 11.0   1st Qu.:  6.25   1st Qu.: 10.00  
##  Median : 35.00   Median : 37.0   Median : 21.00   Median : 31.00  
##  Mean   : 41.44   Mean   : 47.1   Mean   : 29.69   Mean   : 40.87  
##  3rd Qu.: 63.00   3rd Qu.: 68.0   3rd Qu.: 41.00   3rd Qu.: 61.75  
##  Max.   :181.00   Max.   :262.0   Max.   :210.00   Max.   :224.00  
##     KOMAROM           NOGRAD            PEST            SOMOGY      
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  6.00   1st Qu.:  4.00   1st Qu.: 28.25   1st Qu.:  6.00  
##  Median : 19.00   Median : 15.00   Median : 81.00   Median : 20.50  
##  Mean   : 25.64   Mean   : 21.85   Mean   : 86.10   Mean   : 27.61  
##  3rd Qu.: 39.00   3rd Qu.: 32.75   3rd Qu.:129.75   3rd Qu.: 41.00  
##  Max.   :160.00   Max.   :112.00   Max.   :431.00   Max.   :155.00  
##     SZABOLCS          TOLNA             VAS            VESZPREM     
##  Min.   :  0.00   Min.   :  0.00   Min.   :  0.00   Min.   :  0.00  
##  1st Qu.:  6.00   1st Qu.:  4.00   1st Qu.:  3.00   1st Qu.:  7.25  
##  Median : 18.50   Median : 12.00   Median : 13.00   Median : 32.00  
##  Mean   : 29.85   Mean   : 20.35   Mean   : 22.47   Mean   : 40.64  
##  3rd Qu.: 45.00   3rd Qu.: 29.00   3rd Qu.: 34.00   3rd Qu.: 59.00  
##  Max.   :203.00   Max.   :131.00   Max.   :141.00   Max.   :230.00  
##       ZALA       
##  Min.   :  0.00  
##  1st Qu.:  4.00  
##  Median : 13.00  
##  Mean   : 19.87  
##  3rd Qu.: 31.00  
##  Max.   :216.00
# Data cleaning (if necessary)
# Parse Date (day/month/year) and handle missing values
chickenpox_data$Date <- as.Date(chickenpox_data$Date, format = "%d/%m/%Y")
# Replace missing case counts with 0 in the county columns only, leaving Date untouched
county_cols <- setdiff(names(chickenpox_data), "Date")
chickenpox_data[county_cols][is.na(chickenpox_data[county_cols])] <- 0
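
Before replacing missing values, it can help to confirm that the dates parsed correctly and that the series really is weekly. The following is a minimal base-R sketch on a small toy data frame; the name `toy` and its values are made up for illustration and are not part of the original data.

```r
# Toy stand-in for chickenpox_data (values are illustrative only)
toy <- data.frame(
  Date = c("03/01/2005", "10/01/2005", "17/01/2005", "24/01/2005"),
  BUDAPEST = c(10L, 12L, NA, 9L),
  BARANYA = c(3L, 4L, 5L, 6L)
)
toy$Date <- as.Date(toy$Date, format = "%d/%m/%Y")

# Dates that fail to parse become NA
n_bad_dates <- sum(is.na(toy$Date))

# In a weekly series, consecutive rows should be exactly 7 days apart
gaps <- as.integer(diff(toy$Date))
all_weekly <- all(gaps == 7L)

# Count missing case values per county before deciding how to handle them
county_cols_toy <- setdiff(names(toy), "Date")
na_per_county <- colSums(is.na(toy[county_cols_toy]))

print(n_bad_dates)
print(all_weekly)
print(na_per_county)
```

If `na_per_county` is zero everywhere, as the `summary()` output above suggests for this dataset, the zero-fill step has no effect.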

Seasonality Analysis

By examining the total cases per month across all years, we can observe if there are any seasonal fluctuations in chickenpox cases. This type of analysis helps in identifying peak periods, which could inform vaccination campaigns or other public health initiatives.

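# Weekly national totals. national_trend is not defined earlier in this report;
# it is assumed here to be the row-wise sum over all county columns.
national_trend <- data.frame(
  Date = chickenpox_data$Date,
  Total_cases = rowSums(chickenpox_data[setdiff(names(chickenpox_data), "Date")])
)

# Extract month from the Date column
national_trend$Month <- month(national_trend$Date)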

# Summarize cases by month
monthly_trend <- national_trend %>%
  group_by(Month) %>%
  summarise(Total_cases = sum(Total_cases))

# Plot monthly seasonality
ggplot(monthly_trend, aes(x = Month, y = Total_cases)) +
  geom_bar(stat = "identity", fill = "orange") +
  labs(title = "Seasonality of Chickenpox Cases (2005-2015)",
       x = "Month", y = "Total Cases") +
  scale_x_continuous(breaks = 1:12, labels = month.name) +
  theme_minimal()
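
Monthly totals can be slightly skewed because calendar months do not contain the same number of reporting weeks. One possible cross-check, sketched here in base R on made-up numbers (`toy_trend` is a hypothetical stand-in for the weekly national totals), is to compare the average per reporting week instead of the raw sum.

```r
# Toy weekly national totals (made-up numbers), one row per reporting week
toy_trend <- data.frame(
  Date = as.Date("2005-01-03") + 7 * 0:8,
  Total_cases = c(100, 120, 110, 130, 140, 90, 80, 70, 60)
)
toy_trend$Month <- as.integer(format(toy_trend$Date, "%m"))

# Total cases and number of reporting weeks that fall in each month
monthly_sum <- tapply(toy_trend$Total_cases, toy_trend$Month, sum)
weeks_counted <- tapply(toy_trend$Total_cases, toy_trend$Month, length)

# Average cases per reporting week, which is comparable across months
monthly_mean <- monthly_sum / weeks_counted
print(weeks_counted)
print(monthly_mean)
```

In this toy example January contains five reporting weeks and February four, so the raw totals exaggerate the difference between the two months compared with the weekly averages.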

Interactive County-Level Comparison (Large Display)

To compare chickenpox trends across counties, the plot below is interactive and rendered at a large size: readers can click entries in the legend to hide or isolate specific counties and hover over data points for details.

# Select all counties to compare
all_counties <- colnames(chickenpox_data)[2:ncol(chickenpox_data)]  # Extract all county names

# Reshape data for comparison
county_comparison_all <- chickenpox_data %>%
  select(Date, all_of(all_counties)) %>%
  pivot_longer(cols = -Date, names_to = "County", values_to = "Cases")

# Create an interactive plot using plotly
# Date was already converted to Date class during data preparation
plot <- ggplot(county_comparison_all, aes(x = Date, y = Cases, color = County)) +
  geom_line() +
  labs(title = "Interactive Comparison of Chickenpox Cases by County (2005-2015)",
       x = "Date", y = "Cases") +
  theme_minimal() +
  theme(legend.position = "right",
        legend.title = element_text(size = 10),
        legend.text = element_text(size = 8))

# Convert ggplot to plotly; width and height are passed to ggplotly(),
# since setting them in layout() is deprecated
interactive_plot <- ggplotly(plot, width = 950, height = 650) %>%
  layout(
    title = list(text = "<b>Interactive Comparison of Chickenpox Cases by County (2005-2015)</b>"),
    legend = list(title = list(text = "<b>Select Counties</b>"))
  )

# Display the interactive plot
interactive_plot

Temporal Autocorrelation Analysis

To understand how chickenpox cases are temporally correlated, we compute and visualize the autocorrelation function (ACF) for selected counties. This will help us detect patterns, such as seasonality or persistence in outbreaks.

# Select a county (e.g., BUDAPEST) for analysis
selected_county <- "BUDAPEST"

# Extract data for the selected county
county_ts <- ts(chickenpox_data[[selected_county]], frequency = 52)  # Weekly data (52 weeks/year)

# Compute and plot ACF
acf_plot <- ggAcf(county_ts, lag.max = 104) +  # Analyze up to 2 years (104 weeks)
  labs(title = paste("Temporal Autocorrelation of Chickenpox Cases in", selected_county),
       x = "Lag (weeks)", y = "ACF") +
  theme_minimal()

acf_plot
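
To illustrate what an annual cycle looks like in an ACF, the sketch below applies base R's `acf()` to a synthetic weekly series with a 52-week period. The series `toy_cases` is simulated and purely illustrative; it is not derived from the chickenpox data.

```r
set.seed(1)

# Synthetic weekly series: a 52-week sine cycle plus a little noise
weeks_idx <- 1:520
toy_cases <- 100 + 50 * sin(2 * pi * weeks_idx / 52) +
  rnorm(length(weeks_idx), sd = 5)

# Sample autocorrelation up to 104 lags, without drawing a plot
toy_acf <- acf(toy_cases, lag.max = 104, plot = FALSE)

# Element 1 of $acf is lag 0, so lag k is stored at position k + 1
acf_half_year <- toy_acf$acf[27]  # lag 26
acf_one_year <- toy_acf$acf[53]   # lag 52
print(c(lag26 = acf_half_year, lag52 = acf_one_year))
```

A strongly positive value at lag 52 together with a strongly negative value at lag 26 is the signature of a yearly cycle; a similar shape in the county ACF plot would point to annual seasonality.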

Spatial Autocorrelation Analysis

Spatial autocorrelation is assessed using Moran’s I, computed on each county’s total case count over the study period, to determine whether chickenpox totals in one county are similar to those in neighboring counties.

# Extract all unique county names
all_counties <- unique(c(county_edges$name_1, county_edges$name_2))

# Initialize a square adjacency matrix
adjacency_matrix <- matrix(0, nrow = length(all_counties), ncol = length(all_counties),
                           dimnames = list(all_counties, all_counties))

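# Fill the adjacency matrix with 1 where there is an edge
for (i in seq_len(nrow(county_edges))) {
  row <- county_edges$name_1[i]
  col <- county_edges$name_2[i]
  adjacency_matrix[row, col] <- 1
  adjacency_matrix[col, row] <- 1  # Ensure symmetry
}

# The edge list contains self-loops (e.g. BACS-BACS); a county should not
# count as its own neighbor when computing Moran's I
diag(adjacency_matrix) <- 0

# Convert to a row-standardized spatial weights list
weights <- mat2listw(adjacency_matrix, style = "W")

# Compute county-level totals for chickenpox cases
county_totals <- chickenpox_data %>%
  select(-Date) %>%
  colSums()

# Reorder the totals to match the row order of the adjacency matrix:
# moran.test() pairs values and weights by position, not by name
county_totals <- county_totals[rownames(adjacency_matrix)]

# Run Moran's I test
moran_test <- moran.test(county_totals, weights)
# NOTE: the figures shown below were produced without the self-loop removal
# and reordering steps; re-run the test to refresh them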

# Display Moran's I results
cat("Moran's I: ", moran_test$estimate[1], "\n")
## Moran's I:  0.2191511
cat("P-value: ", moran_test$p.value, "\n")
## P-value:  0.007661405
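
For readers who want to see what the statistic measures, the following base-R sketch computes Moran's I by hand on a toy chain of four regions, using row-standardized weights as in `style = "W"`. The function name `moran_i_manual` and the toy values are hypothetical and only illustrate the formula I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, where z holds deviations from the mean and S0 is the sum of all weights.

```r
# Hand-computed Moran's I with row-standardized weights (illustrative helper)
moran_i_manual <- function(x, adjacency) {
  w <- adjacency / rowSums(adjacency)  # each row of weights sums to 1
  z <- x - mean(x)                     # deviations from the mean
  n <- length(x)
  s0 <- sum(w)                         # sum of all weights
  (n / s0) * sum(w * outer(z, z)) / sum(z^2)
}

# Toy map: four regions in a chain, 1-2, 2-3, 3-4 (no self-loops)
adj <- matrix(0, nrow = 4, ncol = 4)
adj[cbind(1:3, 2:4)] <- 1
adj <- adj + t(adj)

# Smooth gradient: neighbors hold similar values
i_gradient <- moran_i_manual(c(1, 2, 3, 4), adj)

# Alternating pattern: neighbors hold dissimilar values
i_alternating <- moran_i_manual(c(1, 4, 1, 4), adj)

print(c(gradient = i_gradient, alternating = i_alternating))
```

Positive values indicate that neighboring regions tend to have similar values and negative values indicate the opposite; under no spatial autocorrelation the expected value is -1 / (n - 1).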

Weekly Heatmap of Chickenpox Cases by County (2005-2015)

Description: This script generates a heatmap of chickenpox cases by week of the year and county for the 20 counties in Hungary (2005-2015). The data is first reshaped to long format, and cases are then summed across years for each week of the year and county. Each tile in the heatmap therefore represents the total cases for one week-county pair, with color intensity indicating the number of cases. The visualization helps identify temporal and spatial patterns, such as peaks and hotspots of chickenpox outbreaks.

# Reshape data to get weekly cases for each county
heatmap_data <- chickenpox_data %>%
  select(Date, all_of(all_counties)) %>%
  pivot_longer(cols = -Date, names_to = "County", values_to = "Cases") %>%
  mutate(Week = week(Date))  # Date is already of class Date

# Sum cases across years for each week of the year and county, so that
# every (Week, County) pair maps to exactly one tile instead of being overplotted
heatmap_data <- heatmap_data %>%
  group_by(Week, County) %>%
  summarise(Total_cases = sum(Cases), .groups = "drop")
# Create a heatmap plot
heatmap_plot <- ggplot(heatmap_data, aes(x = Week, y = County, fill = Total_cases)) +
  geom_tile() +
  scale_fill_gradient(low = "white", high = "brown") +
  labs(title = "Heatmap of Chickenpox Cases by County and Week (2005-2015)",
       x = "Week of Year", y = "County", fill = "Total Cases") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # Rotate x-axis labels for clarity

# Display the heatmap
heatmap_plot
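
The week index on the x-axis runs from 1 to 53 rather than 1 to 52. The base-R sketch below reproduces the definition documented for lubridate's `week()`, namely the number of complete seven-day periods since January 1st plus one; the helper name `week_of_year` is hypothetical and used only for illustration.

```r
# Week of year as complete 7-day periods since 1 January, plus one
week_of_year <- function(d) {
  yday0 <- as.POSIXlt(d)$yday  # zero-based day of the year
  yday0 %/% 7 + 1
}

# A few Mondays of the kind used as reporting dates in the data
dates <- as.Date(c("2005-01-03", "2005-01-10", "2005-12-26", "2007-12-31"))
weeks_out <- week_of_year(dates)
print(data.frame(Date = dates, Week = weeks_out))
```

Only dates in the last day or two of a year fall into week 53, so that column aggregates far fewer observations than the others and should be read with care.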

Monthly Heatmap of Chickenpox Cases by County

Description: This script creates a detailed heatmap visualizing the monthly distribution of chickenpox cases across Hungary’s counties over multiple years. The data is aggregated to show total cases for each month and year, organized by county. Each subplot represents a county, with colors indicating case intensity using the “YlOrRd” palette for better distinction of peaks. This visualization highlights intra-annual seasonality and temporal patterns of chickenpox outbreaks, offering insights into seasonal trends within each county.

# Calculate total cases by year and month for each county
seasonality_data <- chickenpox_data %>%
  select(Date, all_of(all_counties)) %>%
  pivot_longer(cols = -Date, names_to = "County", values_to = "Cases") %>%
  mutate(Year = year(as.Date(Date, "%d/%m/%Y")),
         Month = month(as.Date(Date, "%d/%m/%Y"))) %>%
  group_by(Year, Month, County) %>%
  summarise(Total_cases = sum(Cases)) %>%
  ungroup()
## `summarise()` has grouped output by 'Year', 'Month'. You can override using the
## `.groups` argument.
ggplot(seasonality_data, aes(x = Month, y = Year, fill = Total_cases)) +
  geom_tile() +
  scale_fill_gradientn(colors = brewer.pal(9, "YlOrRd")) +  # Using the YlOrRd palette for the color scale
  facet_wrap(~ County, scales = "free_y") +
  labs(title = "Heatmap of Monthly Chickenpox Cases by County",
       x = "Month", y = "Year", fill = "Total Cases") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        strip.text = element_text(size = 10))  # Adjust size of facet labels for better readability

Conclusion

This study examined the spatio-temporal patterns of chickenpox cases in Hungary over a 10-year period. The analysis revealed clear trends in the spread and intensity of outbreaks, both nationally and across individual counties. Seasonal patterns were evident, with cases peaking during specific months, indicating strong intra-annual seasonality. The comparison across counties showed differences in case numbers, suggesting that some regions experienced higher or more frequent outbreaks than others.

Temporal and spatial autocorrelation analyses further highlighted how chickenpox cases were influenced by both time and location, indicating clusters of outbreaks that followed predictable patterns. These insights can support public health strategies, such as targeted vaccination campaigns, to reduce the spread of chickenpox in high-risk areas and during peak seasons. Overall, the findings emphasize the importance of combining spatial and temporal data for understanding disease dynamics and improving healthcare planning.